1 Introduction

This simulation study compares frequentist and Bayesian approaches to treatment-control comparisons in platform trials that use non-concurrent controls (NCC). We restrict attention to trials with continuous endpoints.

2 Methods

We consider the following approaches:

  • Frequentist regression model that uses all data collected until the arm under study leaves the trial and adjusts for periods using a stepwise function. Note: this model extends the one presented by Bofill Roig et al. to trials with more than two arms.

  • The Bayesian Time Machine (Saville et al.), which uses a second-order Bayesian normal dynamic linear model (NDLM). It uses all data collected until the investigated arm leaves the trial and adjusts for time as a covariate: the trial is separated into buckets of pre-defined size, and a hierarchical model smooths the control response over time.

  • The MAP prior approach (Schmidli et al.), where non-concurrent control data is used to obtain the MAP prior distribution for the control response in the concurrent period.
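
For orientation, the frequentist regression model in the first bullet can be sketched as follows. This is a reconstruction from the description above, not a formula quoted from this report. The symbols \(k_j\) (arm of patient \(j\)), \(t_j\) (period in which patient \(j\) was enrolled), \(\tau_s\) (stepwise effect of period \(s\)), \(\mathcal{K}_M\) (arms that entered the trial before arm \(M\) left it) and \(S_M\) (period in which the arm under study, \(M\), left the trial) are notation assumed here for illustration:

\[E(y_j) = \eta_0 + \sum_{k \in \mathcal{K}_M} \theta_k \cdot I(k_j=k) + \sum_{s=2}^{S_M} \tau_s \cdot I(t_j=s)\]

Under this sketch, the model is fitted to the data from periods \(1,\ldots,S_M\) only, and the comparison of arm \(M\) against control is based on the estimate of \(\theta_M\).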

For comparative purposes, we also analyse the data using the separate approach, which uses concurrent control (CC) data only, as well as naive pooling of CC and NCC data.
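
The difference between the separate and the pooled analysis lies in which control patients enter the comparison. The following Python sketch illustrates one way to make that selection. The function name `control_indices`, the data layout (one list for arms, one for periods) and the toy data are assumptions made for illustration and are not taken from the software used in this study.

```python
def control_indices(arm_of, period_of, k, approach):
    """Return indices of the control patients used when evaluating arm k.

    arm_of[j]    : arm of patient j (0 = control, 1..K = treatment arms)
    period_of[j] : period in which patient j was enrolled
    approach     : "separate" (CC only) or "pooled" (CC and NCC)
    """
    if approach not in ("separate", "pooled"):
        raise ValueError("approach must be 'separate' or 'pooled'")
    # Periods in which arm k was recruiting
    arm_periods = {p for a, p in zip(arm_of, period_of) if a == k}
    last_period = max(arm_periods)
    selected = []
    for j, (a, p) in enumerate(zip(arm_of, period_of)):
        if a != 0:
            continue  # not a control patient
        if approach == "separate" and p in arm_periods:
            selected.append(j)  # concurrent controls only
        elif approach == "pooled" and p <= last_period:
            selected.append(j)  # all controls until arm k left the trial
    return selected


# Toy trial (hypothetical): arm 1 recruits in periods 1-2, arm 2 in periods 2-3.
arm_of = [0, 1, 0, 1, 2, 0, 2]
period_of = [1, 1, 2, 2, 2, 3, 3]
cc = control_indices(arm_of, period_of, 2, "separate")
pooled = control_indices(arm_of, period_of, 2, "pooled")
```

In the toy trial, the separate analysis of arm 2 drops the control patient enrolled in period 1, whereas the pooled analysis keeps it as a non-concurrent control.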

3 Data generation

3.1 Trial design

We simulated platform trials evaluating the efficacy of \(K\) treatment arms compared to a shared control. Arm 1 starts together with the trial (\(d_1=0\)), and arm \(k\) (\(k>1\)) enters after \(d_k\) patients have been recruited to the trial. Patients are allocated to the active treatment arms and control following a 1:1:…:1 allocation. The duration of the trial is split into \(S\) periods, which are defined as time intervals bounded by any treatment either entering or leaving the trial.
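
The design described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the generator used in this study: the function name `simulate_allocation` is hypothetical, and randomisation is replaced by a deterministic rule that always assigns the next patient to the least-filled active arm of the current period, which keeps the allocation balanced within each period.

```python
def simulate_allocation(d, n_arm):
    """Assign patients to arms and periods for entry times d and n_arm per arm.

    Arm 0 is the shared control; arms 1..K are treatment arms.
    Returns two lists, indexed by patient: arm and period.
    """
    K = len(d)
    total = [0] * (K + 1)          # patients per arm so far
    arm_of, period_of = [], []
    period, prev_active, in_period = 0, None, {}
    j = 0                          # number of patients recruited so far
    while any(t < n_arm for t in total[1:]):
        # Arms that have entered and have not yet reached their sample size
        active = tuple(k for k in range(1, K + 1)
                       if d[k - 1] <= j and total[k] < n_arm)
        if active != prev_active:  # an arm entered or left: new period
            period += 1
            in_period = {a: 0 for a in (0,) + active}
            prev_active = active
        # Deterministic stand-in for 1:1:...:1 randomisation (assumption)
        arm = min(in_period, key=lambda a: (in_period[a], a))
        in_period[arm] += 1
        total[arm] += 1
        arm_of.append(arm)
        period_of.append(period)
        j += 1
    return arm_of, period_of


# Example: three arms entering after 0, 250 and 500 patients, 250 per arm
arm_of, period_of = simulate_allocation((0, 250, 500), 250)
n_periods = max(period_of)
```

With these inputs the sketch yields five periods: three arm entries and two arm exits before the last arm completes.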

3.2 Patient response

The continuous response \(y_j\) for patient \(j\) was generated according to:

\[E(y_j) = \eta_0 + \sum_{k=1}^K \theta_k \cdot I(k_j=k) + f(j)\] where \(\eta_0\) is the response in the control arm, \(\theta_k\) is the effect of treatment \(k\), and \(k_j\) denotes the arm to which patient \(j\) was allocated.

The function \(f(j)\) denotes the time trend, whose strength is indicated by \(\lambda_{k_j}\) and which can follow one of three patterns:

  • Linear time trend: \(f(j) = \lambda_{k_j} \cdot \frac{j-1}{N-1}\), where \(N\) is the total sample size in the trial

  • Stepwise time trend: \(f(j) = \lambda_{k_j} \cdot (c_j - 1)\), where \(c_j\) is the number of treatment arms that had already entered the ongoing trial when patient \(j\) was enrolled

  • Inverted-U time trend: \[f(j) = \begin{cases} \lambda \cdot \frac{j-1}{N-1} & \text{for } j \leq N_p \\ -\lambda \cdot \frac{j-N_p}{N-1} + \lambda \cdot \frac{N_p-1}{N-1} & \text{for } j > N_p \end{cases}\]

where \(N_p\) is the patient index at which the trend switches direction.
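
The three time trend patterns and the mean response translate directly into code. The Python sketch below is illustrative only; the function names and the dictionary used for the treatment effects are assumptions and do not correspond to the software used in this study.

```python
def linear_trend(j, lam, N):
    """Linear trend: rises from 0 (first patient) to lam (last patient)."""
    return lam * (j - 1) / (N - 1)


def stepwise_trend(c_j, lam):
    """Stepwise trend: jumps by lam each time a further arm has entered."""
    return lam * (c_j - 1)


def inverted_u_trend(j, lam, N, N_p):
    """Inverted-U trend: linear increase up to patient N_p, then decrease."""
    if j <= N_p:
        return lam * (j - 1) / (N - 1)
    return -lam * (j - N_p) / (N - 1) + lam * (N_p - 1) / (N - 1)


def expected_response(eta_0, theta, arm, trend_value):
    """E(y_j) = eta_0 + theta_k (if patient j is in arm k) + f(j).

    theta maps treatment arm -> effect; arm 0 (control) has no effect term.
    """
    return eta_0 + theta.get(arm, 0.0) + trend_value


# Example: patient 501 of 1001 in arm 2, linear trend of strength 0.15
mean_y = expected_response(0.0, {1: 0.25, 2: 0.25},
                           arm=2, trend_value=linear_trend(501, 0.15, 1001))
```

Note that the inverted-U trend coincides with the linear trend up to patient \(N_p\) and is continuous at the switch point.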

4 Considered designs

We consider three designs of platform trials with \(K\) treatment arms that enter the trial in a staggered way. Each design corresponds to an objective:

  • In Design I, we explore the effect of the overlap between arms in trials with \(K=3\) experimental arms, in settings with equal time trends and different time trends.

  • In Design II, we investigate the impact of different entry times in trials with \(K=4\) arms. For this, we vary the entry time of arm \(3\) and consider equal time trends only.

  • In Design III, we explore the operating characteristics of more realistic platform trials. We consider \(K=10\) arms, and random time trends.

The considered trial designs are illustrated below.

In all three designs, we assume equal sample sizes of 250 in all treatment arms and 1:1:…:1 allocation ratio in each period, and consider different scenarios varying the overlaps between arms indicated by \(\mathbf{d} = (d_1,...,d_K)\). Furthermore, we consider linear, stepwise and inverted-U time trends, which are either equal across all arms, or different in arm 1 or arms 1 and 2 (in this case, only arm 3 is evaluated). For the Bayesian Time Machine, we used bucket sizes of 25 in all designs. Moreover, we assumed effect sizes of \(\theta_{i} = 0.25, i=1,...,K\) for the treatment-control comparisons under the alternative hypothesis. The chosen sample and effect sizes lead to 80% power for the treatment-control comparison using a separate analysis (one-sided t-test at 2.5% significance level).
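
The stated power can be checked with a short calculation. The Python sketch below uses a normal approximation to the one-sided two-sample t-test and assumes a unit response variance and 250 concurrent control patients; neither assumption is stated explicitly in this report, and the function name `power_two_sample` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(theta, n_per_arm, sigma=1.0, alpha=0.025):
    """Approximate power of a one-sided two-sample test (normal approximation).

    theta     : true difference in means (treatment minus control)
    n_per_arm : sample size in each of the two groups
    sigma     : common standard deviation (assumed to be 1 here)
    alpha     : one-sided significance level
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    # Standardised effect: theta divided by the standard error of the difference
    noncentrality = theta / (sigma * sqrt(2 / n_per_arm))
    return std_normal.cdf(noncentrality - z_alpha)


power = power_two_sample(theta=0.25, n_per_arm=250)
```

Under these assumptions the approximation gives a power close to 0.80, in line with the value stated above.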

4.1 Considered parameters for the Time Machine

In the remainder of this report, the following parameters were used for the Time Machine:

  • prec_theta: 0.001
  • prec_eta: 0.001
  • prec_a: 0.001
  • prec_b: 0.001
  • bucket_size: 25

For the parameters tau_a and tau_b, we consider the following options based on the expected and maximal jump in the control response between periods, assuming a stepwise time trend with \(\lambda=0.15\) and \(d=250\):

Assumption | Expected jump | Maximal jump | tau_a | tau_b
Reasonable jump | 1e-02 | 0.150 | 1.099121 | 0.0001099
Small jump | 1e-03 | 0.015 | 109.912060 | 0.0001099
Large jump | 1e+01 | 15.000 | 11.562213 | 1156.2213391

In plots comparing different methods to incorporate NCC, we use the assumption of a reasonable jump of the time trend. In plots showing the calibration of the Time Machine model, we present a comparison of the three assumed jump sizes (only for Design I).

4.2 Considered parameters for the MAP Prior

In the remainder of this report, the following parameters were used for the MAPPrior function when comparing this approach to other methods:

  • opt: 2
  • n_samples: 1000
  • n_chains: 4
  • n_iter: 4000
  • n_adapt: 1000
  • robustify: TRUE
  • weight: 0.1
  • prior_prec_eta: 1
  • prior_prec_tau: 0.002

In plots showing the calibration of the MAP approach, we consider the following options for prior_prec_eta and prior_prec_tau:

  • prior_prec_eta \(\in \{ 0.001, 1 \}\)
  • prior_prec_tau \(\in \{ 2, 0.2, 0.002 \}\)

while the other parameters remain unchanged.
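
The calibration settings form a full grid of the two precision parameters, with all remaining parameters held at the values listed above. The following Python sketch enumerates that grid as plain dictionaries. It only mirrors the parameter names from this section and is not a call into the MAPPrior function; the variable names `base` and `grid` are assumptions.

```python
from itertools import product

# Parameters that stay fixed across the calibration (values from the list above)
base = {
    "opt": 2,
    "n_samples": 1000,
    "n_chains": 4,
    "n_iter": 4000,
    "n_adapt": 1000,
    "robustify": True,
    "weight": 0.1,
}

# Every combination of the two prior precisions
grid = [
    dict(base, prior_prec_eta=eta, prior_prec_tau=tau)
    for eta, tau in product([0.001, 1], [2, 0.2, 0.002])
]
```

The grid contains six settings, one of which (prior_prec_eta = 1, prior_prec_tau = 0.002) is the configuration used in the method comparison.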

5 Results

5.1 Design I

  • 3 treatment arms
  • Equal and different time trends
  • Equidistant entry times
  • Vary entry times \(d_i\)

In the first design, we examine a platform trial with 3 treatment arms, where treatment arm \(i\) enters after \(d_i = d \cdot (i-1)\) patients have joined the trial. We consider five values of \(d \in \{0, 125, 250, 375, 500\}\), resulting in platform trials with different overlaps between arms, as illustrated below. We consider time trends that are equal across all arms, or that differ either in arm 1, or in arms 1 and 2. In cases with different time trends, only treatment arm 3 is evaluated.
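
As a small worked example, the entry times implied by \(d_i = d \cdot (i-1)\) can be listed for each considered value of \(d\). The helper name `entry_times` is an assumption made for this sketch.

```python
def entry_times(d, K=3):
    """Entry times d_i = d * (i - 1) for arms i = 1, ..., K."""
    return [d * (i - 1) for i in range(1, K + 1)]


# One vector of entry times per considered value of d
scenarios = {d: entry_times(d) for d in (0, 125, 250, 375, 500)}
```

For \(d=0\) all three arms enter together. For \(d=500\), arm 2 enters after 500 patients, which under 1:1 allocation between arm 1 and control is when arm 1 reaches its sample size of 250, so consecutive arms no longer overlap.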

  • Considered time trends:

Trt 1 | Trt 2 | Trt 3 = Control
\(\lambda\) | \(\lambda\) | \(\lambda\)
\(\lambda\) | 0.1 | 0.1
\(\lambda\) | \(\lambda\) | 0.1
5.1.1 Overview

[Figures: overview of the trial design for d = 0, d = 125, d = 250, d = 375 and d = 500]

5.1.2 No time trend

  • \(\lambda_k = 0\), \(\forall k\)

5.1.2.1 Type I error

5.1.2.2 Power

5.1.3 Equal time trend

5.1.3.1 Type I error w.r.t. d

  • \(\lambda_k = 0.15\), \(\forall k\)

5.1.3.2 Power w.r.t. d

  • \(\lambda_k = 0.15\), \(\forall k\)

5.1.3.3 Type I error w.r.t. \(\lambda\)

  • \(d = 250\)

5.1.3.4 Power w.r.t. \(\lambda\)

  • \(d = 250\)

5.1.4 Time Machine calibration

5.1.4.1 Type I error w.r.t. d

  • \(\lambda_k = 0.15\), \(\forall k\)

5.1.4.2 Power w.r.t. d

  • \(\lambda_k = 0.15\), \(\forall k\)

5.1.4.3 Type I error w.r.t. \(\lambda\)

  • \(d = 250\)

5.1.4.4 Power w.r.t. \(\lambda\)

  • \(d = 250\)

5.1.5 MAP Prior calibration

5.1.5.1 Type I error w.r.t. d

  • \(\lambda_k = 0.15\), \(\forall k\)

5.1.5.2 Power w.r.t. d

  • \(\lambda_k = 0.15\), \(\forall k\)

5.1.5.3 Type I error w.r.t. \(\lambda\)

  • \(d = 250\)

5.1.5.4 Power w.r.t. \(\lambda\)

  • \(d = 250\)

5.1.6 Different time trend in arm 1

5.1.6.1 Type I error w.r.t. d

  • \(\lambda_0 = \lambda_2 = \lambda_3 = 0.1\)
  • \(\lambda_1 = 0.25\)

5.1.6.2 Power w.r.t. d

  • \(\lambda_0 = \lambda_2 = \lambda_3 = 0.1\)
  • \(\lambda_1 = 0.25\)

5.1.6.3 Type I error w.r.t. \(\lambda\)

  • \(\lambda_0 = \lambda_2 = \lambda_3 = 0.1\)
  • \(d = 250\)

5.1.6.4 Power w.r.t. \(\lambda\)

  • \(\lambda_0 = \lambda_2 = \lambda_3 = 0.1\)
  • \(d = 250\)

5.2 Design II

  • 4 treatment arms
  • Equal time trends
  • Non-equidistant entry times
  • Vary entry time \(d_3\) for arm 3

In the second design, we examine a platform trial with 4 treatment arms, where treatment arms 2 and 4 enter after \(d_2=300\) and \(d_4=800\) patients have been recruited to the trial, respectively. We consider five options for the entry time of the third treatment arm, \(d_3 \in \{300, 425, 550, 675, 800\}\), as illustrated below.

5.2.1 Overview

[Figures: overview of the trial design for d3 = 300, d3 = 425, d3 = 550, d3 = 675 and d3 = 800]

5.2.2 Equal time trend

  • \(\lambda_k = 0.15\), \(\forall k\)

5.2.2.1 Type I error

5.2.2.2 Power

5.3 Design III - I

  • 10 treatment arms
  • Different time trends
  • Evaluated treatment arm: 10

Scenario III-I consists of 10 treatment arms, where treatment arm \(i\) enters after \(300 \cdot (i-1)\) patients have been recruited to the trial. In this case, only treatment arm 10 is evaluated; like the control group, it has no time trend (\(\lambda_0 = \lambda_{10} = 0\)). The time trend in the remaining treatment arms is varied, with \(\lambda_1=\lambda_2=\ldots=\lambda_9\).

5.3.1 Different time trend

5.3.1.1 Type I error - all arms under H0

5.3.1.2 Type I error - arms 1-9 under H1

5.3.1.3 Power

5.4 Design III - II

  • 10 treatment arms
  • Different (random) time trends
  • Evaluated treatment arm: 10

Scenario III-II consists of 10 treatment arms, where treatment arm \(i\) enters after \(300 \cdot (i-1)\) patients have been recruited to the trial. In this case, only treatment arm 10 is evaluated; its time trend is varied and is equal to that of the control group (\(\lambda_0 = \lambda_{10}\)). The time trends in the remaining treatment arms are sampled from \(\lambda_i \sim N(\lambda_0, 0.5), \forall i \in \{1,\ldots,9\}\).
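
The random time trends can be drawn as in the following Python sketch. The report does not state whether the second parameter of \(N(\lambda_0, 0.5)\) is a standard deviation or a variance; the sketch treats it as a standard deviation, which is an assumption. The function name `sample_trends` and the seed handling are also hypothetical.

```python
import random


def sample_trends(lambda_0, spread=0.5, n_arms=9, seed=None):
    """Draw time trend strengths for arms 1..n_arms around lambda_0.

    spread is passed to the normal sampler as a standard deviation
    (assumption); use spread=0.5 ** 0.5 if 0.5 is meant as a variance.
    """
    rng = random.Random(seed)
    return [rng.gauss(lambda_0, spread) for _ in range(n_arms)]


# Example: trends for arms 1-9 when control and arm 10 share lambda_0 = 0.15
lambdas = sample_trends(lambda_0=0.15, seed=1)
```

Fixing the seed makes a simulated scenario reproducible; a fresh draw per simulation replicate would be obtained by omitting it.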

5.4.1 Different (random) time trend

5.4.1.1 Type I error

5.4.1.2 Power